Question: Is it possible to predict whether tomorrow will be rainy or not?
Answer: Yes, we have reasonably enough data to make a prediction.
We will now dive into the given dataset and then show several different models for prediction.
As we said, before we start exploring models, we need to be sure that the dataset is appropriate for this purpose.
In general it is; however, it still requires some "cleaning", and fixing that is our first step.
We start by installing the ydata_profiling package, which we will use later to explore our data.
%pip install ydata_profiling numpy pandas matplotlib seaborn scikit-learn PyQt6 ipykernel
%matplotlib inline
First we import all the Python libraries we are going to use in this section and beyond. Then, using pandas, we read the data from the CSV file.
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from ydata_profiling import ProfileReport
from sklearn.preprocessing import LabelEncoder
data = pd.read_csv('weatherAUS.csv')
We read the raw data and use ProfileReport to illustrate information about unique values, distributions, missing values, variable types and so on (it is a very good tool for gaining first insights). The report also shows some alerts, and our mission is to resolve as many of them as possible. You may want to revisit the Alerts section to compare after finishing the EDA.
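Before opening the full report, a few plain pandas calls can give a rough preview of the same kind of information (types, missing values, unique counts). The sketch below is only an illustration: the `quick_summary` helper is hypothetical, and the tiny `sample` frame with made-up column names and values stands in for `data`.

```python
import numpy as np
import pandas as pd


def quick_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing percentage, unique values."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(dropna=True),
    })
    # Columns with the most missing values come first, since they need the most cleaning.
    return summary.sort_values("missing", ascending=False)


# Hypothetical stand-in for `data`; the column names and values are made up.
sample = pd.DataFrame({
    "MinTemp": [13.4, 7.4, np.nan, 9.2],
    "Rainfall": [0.6, np.nan, np.nan, 0.0],
    "RainTomorrow": ["No", "No", "Yes", "No"],
})

summary = quick_summary(sample)
print(summary)
```

Calling `quick_summary(data)` on the real dataset would give the same table for every column, which is a convenient baseline to compare against the Alerts section of the report.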
ydata-profiling is a package for data profiling that automates and standardizes the generation of detailed reports, complete with statistics and visualizations. The quality of data profiling is key in data science and machine learning development. The main value of this package is the quality of the data analysis and visualization it returns from a single line of code.
ydata-profiling is a great tool because, with a single call, it supports Exploratory Data Analysis (EDA), provides comprehensive insights, helps improve data quality, supports data exploration for large datasets, and much more. But how does this package work?
In the code below, we use ProfileReport to generate a detailed report. We first check whether the report already exists in this folder (that is, whether it was generated on an earlier run), so that we don't spend time regenerating it. If it does not exist, ProfileReport generates it.
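report_file = 'weatherAUS_report_1.html'
# Generate the report only if it is not already in this folder.
if not os.path.exists(report_file):
    profile = ProfileReport(data, title='Weather in Australia - Report')
    profile.to_file(report_file)
    profile.dump('report_1')  # serialized to report_1.pp
# Load the serialized report, whether it was just written or comes from an earlier run.
profile = ProfileReport().load('report_1.pp')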
profile.to_notebook_iframe()
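The "generate once, reload afterwards" pattern described above can be isolated into a small helper. The sketch below is a generic illustration that uses only the standard library: `load_or_build`, the cache file name, and the stand-in `expensive_report` function are all hypothetical, and `pickle` stands in here for the report's own dump/load mechanism. It checks for the same file that it later loads, so the existence check and the load cannot drift apart.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_build(cache_path: Path, build: Callable[[], Any]) -> Any:
    """Return the cached object if cache_path exists; otherwise build and cache it."""
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    result = build()
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = {"count": 0}  # counts how often the expensive step really runs


def expensive_report() -> dict:
    """Hypothetical stand-in for a slow step such as building a profile report."""
    calls["count"] += 1
    return {"title": "Weather in Australia - Report", "rows": 4}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "report_cache.pkl"
    first = load_or_build(cache_file, expensive_report)   # builds and writes the cache
    second = load_or_build(cache_file, expensive_report)  # reads the cache instead

print(calls["count"], first == second)
```

The expensive function runs only on the first call; the second call is served from disk. The same idea applies to the profile report: test for the serialized file that will be loaded, rather than for a different output file.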